Prior to start

This section performs spacial and statistical analysis on the data of Manhattan. In the analysis, the gross_income_per_sq_ft is treated as the indicator of rental price. Since the COVID-19 pandemic had great influence on the property market, we mainly focus on reports after 2019.

From the prospective of tenant rather than property owner, people cares about how to make the most cost-effective choice and rent a satisfactory room with lowest price. This page are intended to provide helpful information.


Overview

# load packages and data
library(tidyverse)
library(plotly)

load("E:/2023_autumn/DataScience/Final_Proj/data/cleaned_data.RData")
load("E:/2023_autumn/DataScience/Final_Proj/data/Manhattan_latlonmap.RData")


transformed_rental_income=
  transformed_rental_income %>%
  separate(boro_block_lot, into = c("borough", "block", "lot"), sep = "-") %>%
  mutate(
  borough = case_when(
    borough == "1" ~ "Manhattan",
    borough == "2" ~ "Bronx",
    borough == "3" ~ "Brooklyn",
    borough == "4" ~ "Queens",
    borough == "5" ~ "Staten Island")
  ) %>% 
  filter(
    borough == "Manhattan"
  )

Manhattan_data = Manhattan_data %>%
  separate(address, into = c("address", "toremove"), sep = ",") %>%
  select(-toremove)

transformed_rental_income = 
  left_join(transformed_rental_income, Manhattan_data, by = "address")

## Columbia university lat lon 40.80754 -73.96257
## Hammer health science building lat lon 40.84252 -73.94243 

for_map =  transformed_rental_income %>% 
  filter(report_year > 2019) %>%
  select(-c(latitude,longitude)) 

income_sqft_M = for_map %>%
  plot_mapbox(
    lon = ~lon,
    lat = ~lat,
    color = ~gross_income_per_sq_ft,
    mode = 'markers',
    alpha = 0.5,
    hovertext = ~gross_income_per_sq_ft,
    hoverinfo = 'text',
    colors = "Spectral"
  ) %>%
  layout(
    mapbox = list(
      style = "streets",
      zoom = 10,
      center = list(lat = 40.81, lon = -73.96)
    )
  )

yearsbuild_M = for_map %>%
  plot_mapbox(
    lon = ~lon,
    lat = ~lat,
    color = ~year_built,
    mode = 'markers',
    alpha = 0.5,
    hovertext = ~year_built,
    hoverinfo = 'text',
    colors = c('red','yellow','green')
  ) %>%
  layout(
    mapbox = list(
      style = "streets",
      zoom = 10,
      center = list(lat = 40.81, lon = -73.96)
    )
  )

elevator_M = for_map %>%
  plot_mapbox(
    lon = ~lon,
    lat = ~lat,
    color = ~building_classification,
    mode = 'markers',
    alpha = 1,
    hovertext = ~building_classification,
    hoverinfo = 'text'
  ) %>%
  layout(
    mapbox = list(
      style = "streets",
      zoom = 10,
      center = list(lat = 40.81, lon = -73.96)
    )
  )


subplot(income_sqft_M, yearsbuild_M,elevator_M, nrows = 1, margin = 0.01) %>% layout(showlegend = FALSE, title = 'Income/sqft - Year built - Classification')

When considering rent an apartment, people may refers to the location, building’s lifespan, elevator facilities and rental price. The figure above plotted the Income/sqft, Year built and Elevator info according to building’s location. The three plots are in same size and version, providing a comparable overview.

Rental Price

income_sqft_M

The rental price showed great difference from the north edge to south edge of Manhattan. Rentals at upper town that close to Bronx are the cheapest. As the street numbers get lower, the rental price increase and reaches its peak at west village. There is less difference between the rents at west and east side of Manhattan. But specially, rents at the lowers east side are significantly lower than surrondings.

Years built

yearsbuild_M

Generally, the building at west side are older than those at east side of Manhattan. There are more new buildings in midtown and lower Manhattan than in uptown Manhattan.

Elevator Facilities

elevator_M
  • D0 – Elevator Co-op; Conversion From Loft/Warehouse
  • D1 – Elevator Apt; Semi-fireproof (Without Stores)
  • D2 – Elevator Apt; Artists in Residence
  • D3 – Elevator Apt; Fireproof (Standard Construction Without Stores)
  • D4 – Elevator Cooperatives
  • D5 – Elevator Apt; Converted
  • D6 – Elevator Apt; Fireproof – With Stores
  • D7 – Elevator Apt; Semi-Fireproof With Stores
  • D8 – Elevator Apt; Luxury Type
  • D9 – Elevator Apt; Miscellaneous

Most of the buildings in Manhattan are equipped with Elevator, while there are still groups of walk-up buildings at uptown near Bronx, midtown west and lower Manhattan. This may closly related to their year built.


Statistical Analysis

Then we are curious about whether the factors(Years built and Elevator Facilities) are truly related to the rental price. We choose a hypothesis test approach to explore their relationship.

One Way ANOVA on Elevator Facilities

for_map %>%
  group_by(building_classification) %>%
  summarise(
    n = n(),
    mean = mean(gross_income_per_sq_ft),
    sum = sum(gross_income_per_sq_ft),
    variance = var(gross_income_per_sq_ft)
  )
## # A tibble: 17 × 5
##    building_classification     n  mean      sum variance
##    <chr>                   <int> <dbl>    <dbl>    <dbl>
##  1 C1-WALK-UP               1476  33.2  49058.     196. 
##  2 C4-WALK-UP               1195  40.8  48767.     144. 
##  3 C5-WALK-UP                366  46.6  17051.      77.3
##  4 C6-WALK-UP               1410  37.7  53185.     183. 
##  5 C7-WALK-UP                960  37.6  36140.     228. 
##  6 C8-WALK-UP                  1  38.0     38.0     NA  
##  7 C9-WALK-UP                  1  55.4     55.4     NA  
##  8 D0-ELEVATOR               265  46.3  12261.      35.4
##  9 D1-ELEVATOR              2553  36.0  91896.     148. 
## 10 D2-ELEVATOR                65  36.0   2342.      69.3
## 11 D3-ELEVATOR              1861  38.7  71938.      81.7
## 12 D4-ELEVATOR              3160  40.4 127553.     110. 
## 13 D5-ELEVATOR               463  45.8  21203.      49.0
## 14 D6-ELEVATOR              2724  45.9 125110.      62.5
## 15 D7-ELEVATOR              1070  39.4  42170      141. 
## 16 D8-ELEVATOR               307  45.5  13958.      59.9
## 17 D9-ELEVATOR               619  43.3  26800.      95.9
for_map %>%
  ggplot(aes(x =building_classification, y = gross_income_per_sq_ft,fill =building_classification))+
  geom_boxplot()+
  theme(
        axis.text.x = element_text(angle = 90)
      )+
  guides(fill=F)

According to the box plot, we have few records on C8-WALK-UP and C9-WALK-UP, so we filtered them out in the anova test.

for_anova = for_map %>%
  filter(
  building_classification!= "C8-WALK-UP",
  building_classification!= "C9-WALK-UP"
  )%>%
  select(c(building_classification,gross_income_per_sq_ft))
  
res = lm(gross_income_per_sq_ft ~ factor(building_classification), data = for_anova)
anova(res)
## Analysis of Variance Table
## 
## Response: gross_income_per_sq_ft
##                                    Df  Sum Sq Mean Sq F value    Pr(>F)    
## factor(building_classification)    14  280558 20039.9  162.64 < 2.2e-16 ***
## Residuals                       18479 2276904   123.2                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In this one way anova test, our null hypothesis is that the average price of each kinds of building are the same. Given a p-value much smaller than 0.05, we reject the null hypothesis and conclude that there are differences in the average price of each kinds of building. This indicates that the building classification is related to the rental price.

The detailed relationship across each types can be further checked by paired t-test.

SLR on Years built

for_map %>%
  ggplot(aes(x = gross_income_per_sq_ft))+
  geom_histogram()

fit1 = lm(gross_income_per_sq_ft ~ year_built, data = for_map)
summary(fit1)
## 
## Call:
## lm(formula = gross_income_per_sq_ft ~ year_built, data = for_map)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.293  -6.901   2.249   7.864  42.745 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -22.301489   6.192131  -3.602 0.000317 ***
## year_built    0.032274   0.003208  10.060  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.73 on 18494 degrees of freedom
## Multiple R-squared:  0.005442,   Adjusted R-squared:  0.005388 
## F-statistic: 101.2 on 1 and 18494 DF,  p-value: < 2.2e-16
plot(fit1,which = 1)

plot(fit1,which = 2)

We applied SLR to test the significance of the years built. Given the null hypothesis that the coefficient of years built term is 0, p-value is small and we can reject the null. However, the estimated coefficient is small and the price seems not normally distributed, which may lead to inaccurate result.


For Columbia Students

What about the price near Columbia university and Columbia university Irving medical center?

This map may help you to choose your apartment!

latCU = 40.80754
lonCU = -73.96257
latCUIMC = 40.84252
lonCUIMC = -73.94243

near_columbia = for_map %>%
  filter(
    lat >  latCU - 0.01,
    lat < latCU + 0.01,
    lon > lonCU -0.005,
    lon < lonCU + 0.005
    ) %>%
  plot_mapbox(
    lon = ~lon,
    lat = ~lat,
    color = ~gross_income_per_sq_ft,
    mode = 'markers',
    alpha = 1,
    hovertext = ~address,
    text = ~gross_income_per_sq_ft,
    meta = ~building_classification,
    hovertemplate = "%{hovertext} <br> %{meta} <br>gross_income_per_sq_ft: %{text}",
    hoverinfo = 'text',
    colors = "Spectral"
  )  %>%
  layout(
    mapbox = list(
      style = "streets",
      zoom = 13,
      center = list(lat = latCU, lon = lonCU)
    )
  )


near_CUIMC = for_map %>% 
  filter(
    lat >  latCUIMC - 0.01,
    lat < latCUIMC + 0.01,
    lon > lonCUIMC -0.005,
    lon < lonCUIMC + 0.005    
  )%>%
  plot_mapbox(
    lon = ~lon,
    lat = ~lat,
    color = ~gross_income_per_sq_ft,
    mode = 'markers',
    alpha = 1,
    hovertext = ~address,
    text = ~gross_income_per_sq_ft,
    meta = ~building_classification,
    hovertemplate = "%{hovertext} <br> %{meta} <br>gross_income_per_sq_ft: %{text}",
    hoverinfo = 'text',
    colors = "Spectral"
  ) %>%
  layout(
    mapbox = list(
      style = "streets",
      zoom = 13,
      center = list(lat = latCUIMC, lon = lonCUIMC)
    )
  )

subplot(near_columbia, near_CUIMC, nrows = 1, titleY = TRUE, titleX = TRUE, margin = 0.01)